What format are your alignments in and what do the names mean?

All our alignment files are in BAM format, a standard alignment format which was defined by the consortium and has since seen wide community adoption. We also provide our alignments in CRAM format.

The BAM file names look like:

NA00000.location.platform.population.analysis_group.YYYYMMDD.bam

The bai index and bas statistics files are also named in the same way.

The name starts with the individual sample ID, followed by where the sequence is mapped to. If the file only contains mappings to a particular chromosome, the name contains that chromosome; otherwise, mapped means the whole genome mapping and unmapped means the reads which failed to map to the reference (pairs where only one mate mapped are kept in the mapped file). Next come the sequencing platform, the ethnicity of the sample given as our three letter population code, and the sequencing strategy (the analysis_group field). The date matches the date of the sequence used to build the BAMs and can also be found in the sequence.index filename.
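As an illustration, the naming pattern above can be split into its fields with a few lines of Python. This is only a sketch that assumes the name has exactly the seven dot-separated parts shown in the pattern; the function name `parse_bam_name`, the field names and the example file name are made up for this example and are not part of any IGSR tool.

```python
from datetime import date, datetime
from typing import NamedTuple


class BamName(NamedTuple):
    """Fields of a name following the pattern shown above (hypothetical helper)."""
    sample: str          # individual sample ID, e.g. NA00000
    location: str        # a chromosome name, "mapped" or "unmapped"
    platform: str        # sequencing platform
    population: str      # three letter population code
    analysis_group: str  # sequencing strategy
    sequence_date: date  # date of the sequence used to build the BAM
    extension: str       # "bam" for the pattern handled here


def parse_bam_name(filename: str) -> BamName:
    """Split a file name into its fields; assumes exactly seven parts."""
    parts = filename.split(".")
    if len(parts) != 7:
        raise ValueError(f"expected 7 dot-separated fields, got {len(parts)}: {filename}")
    sample, location, platform, population, analysis_group, stamp, extension = parts
    # strptime raises ValueError if the date field is not a valid YYYYMMDD date
    sequence_date = datetime.strptime(stamp, "%Y%m%d").date()
    return BamName(sample, location, platform, population,
                   analysis_group, sequence_date, extension)


# Made-up file name, used only to show the field order
info = parse_bam_name("NA00000.chrom20.ILLUMINA.CEU.low_coverage.20120522.bam")
print(info.sample, info.location, info.population, info.sequence_date)
```

Names with a different number of fields are rejected by this sketch rather than guessed at, so it would need adjusting for any file that does not follow the pattern exactly.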

IGSR: The International Genome Sample Resource

Supporting open human variation data
